Hierarchical Work-Stealing
Authors
Abstract
We study dynamic load-balancing on hierarchical platforms. In particular, we consider applications involving heavy communications on a distributed platform. The work-stealing algorithm introduced by Blumofe and Leiserson is a commonly used technique for balancing load in a distributed environment, but it performs poorly with some communication-intensive applications. We describe several variants of this algorithm found in the literature and in grid middlewares such as Satin and Kaapi. In addition, we propose two new variants of the work-stealing algorithm: HWS and PWS. These algorithms improve performance by taking the network structure into account. We conduct a theoretical analysis of HWS in the case of fork-join task graphs and prove that HWS reduces communication overhead. We also present experimental results comparing the most relevant algorithms. Experiments on Grid’5000 show that HWS and PWS yield performance gains of up to twenty percent over the classical work-stealing algorithm. Moreover, in some cases, PWS and HWS achieve speedup where classical work-stealing policies result in slowdown.
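The abstract does not include code, but the core idea behind topology-aware stealing, preferring cheap steals within a cluster over expensive steals across the network, can be sketched compactly. The following is a minimal, illustrative Java sketch under an assumed two-level hierarchy (workers grouped into clusters); the Task, WorkerQueue, and HierarchicalThief names are hypothetical, not the authors' API.

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical building blocks, for illustration only.
interface Task { void run(); }
interface WorkerQueue { Task popTop(); }   // thief-side end of a work-stealing deque

class HierarchicalThief {
    private final List<WorkerQueue> localPeers;   // workers in the same cluster (cheap steals)
    private final List<WorkerQueue> remotePeers;  // workers across the network (expensive steals)

    HierarchicalThief(List<WorkerQueue> localPeers, List<WorkerQueue> remotePeers) {
        this.localPeers = localPeers;
        this.remotePeers = remotePeers;
    }

    // Try every victim in the local cluster before paying for a remote steal.
    Task steal() {
        Task t = stealFrom(localPeers);
        return (t != null) ? t : stealFrom(remotePeers);
    }

    private Task stealFrom(List<WorkerQueue> victims) {
        if (victims.isEmpty()) return null;
        int start = ThreadLocalRandom.current().nextInt(victims.size());
        for (int i = 0; i < victims.size(); i++) {    // random starting point, then sweep
            Task t = victims.get((start + i) % victims.size()).popTop();
            if (t != null) return t;
        }
        return null;
    }
}

A probabilistic variant of the same idea would keep a single victim list and bias the random draw toward nearby workers instead of ordering the two lists strictly.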
Similar Papers
A multithreaded scheduling model for solving the Tower of Hanoi game in a multicore environment
Modern computer systems depend heavily on multithreaded scheduling to balance the workload among their processing units. One such technique, work-stealing, has proven effective at balancing the distribution of threads by stealing threads from busy cores and reallocating them to idle cores. In this study, we propose a new strategy that exten...
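For a concrete picture of the mechanism this snippet describes, the JVM's java.util.concurrent.ForkJoinPool implements classical work-stealing: each worker keeps a deque of forked tasks, and idle workers steal from the others. The following self-contained example (an illustration, not that study's implementation) counts Tower of Hanoi moves with a fork-join recursion, so idle workers steal the forked subproblems.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// moves(n) = 2 * moves(n - 1) + 1, i.e. 2^n - 1 moves for n discs.
class HanoiMoves extends RecursiveTask<Long> {
    private final int discs;

    HanoiMoves(int discs) { this.discs = discs; }

    @Override
    protected Long compute() {
        if (discs <= 1) return (long) discs;           // 0 or 1 disc: 0 or 1 move
        HanoiMoves before = new HanoiMoves(discs - 1); // move n-1 discs aside...
        before.fork();                                 // ...as a stealable subtask
        HanoiMoves after = new HanoiMoves(discs - 1);  // ...then back on top of the largest
        return after.compute() + before.join() + 1;    // +1 for moving the largest disc
    }

    public static void main(String[] args) {
        System.out.println(ForkJoinPool.commonPool().invoke(new HanoiMoves(20))); // prints 1048575
    }
}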
Hierarchical Work Stealing on Manycore Clusters
Partitioned Global Address Space languages like UPC offer a convenient way of expressing large shared data structures, especially for irregular structures that require asynchronous random access. But the static SPMD parallelism model of UPC does not support divide and conquer parallelism or other forms of dynamic parallelism. We introduce a dynamic tasking library for UPC that provides a simple...
Multigrain Affinity for Heterogeneous Work Stealing
In a parallel computing context, peak performance is hard to reach with irregular applications such as sparse linear algebra operations. Reaching it requires dynamic adjustments that automatically balance the workload between several processors. The problem becomes even harder when an architecture contains processing units with radically different computing capabilities. We present a hierarchica...
Nested Parallelism in the OMPi OpenMP/C Compiler
This paper presents a new version of the OMPi OpenMP C compiler, enhanced by lightweight runtime support based on user-level multithreading. A large number of threads can be spawned for a parallel region and multiple levels of parallelism are supported efficiently, without introducing additional overheads to the OpenMP library. Management of nested parallelism is based on an adaptive distributi...
OpenMP task scheduling strategies for multicore NUMA systems
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on the OpenMP run time system. Efficient scheduling of tasks on modern multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, inclu...